Efficient Text Compression Using Special Character Replacement and Space Removal
Abstract
In this paper, we propose a new text compression/decompression algorithm based on a special character replacement technique. After this initial compression step, we also remove the spaces between words in the intermediate compressed file in specific situations to obtain the final compressed text file. Experimental results show that the proposed algorithm is simple to implement, fast to encode, and achieves a high compression ratio. It even compresses better than existing algorithms and tools such as LZW, WINZIP 10.0 and WINRAR 3.93.
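The abstract names only the two steps, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes 7-bit ASCII input, treats maximal runs of ASCII letters as words, and uses the otherwise unused byte values 128 to 255 as the "special characters", which limits the dictionary to 128 words. The names `build_dictionary`, `compress`, `decompress`, `CODE_BASE` and `MAX_CODES` are hypothetical. The space-removal rule shown (drop a single space only when it sits between two code bytes) is one plausible reading of "in specific situations": because every code byte stands for a whole word and two words are never directly adjacent, the decoder can reinsert that space unambiguously.

```python
import re
from collections import Counter

# Hypothetical parameters: the abstract does not say which special characters
# are used, so this sketch assumes 7-bit ASCII input and uses the unused byte
# values 128..255 as one-byte word codes.
CODE_BASE = 128
MAX_CODES = 128
WORD = re.compile(rb"[A-Za-z]+")  # assumed tokenisation: maximal letter runs
CODE = re.compile(rb"[\x80-\xff]")  # any code byte
SPACE_BETWEEN_CODES = re.compile(rb"(?<=[\x80-\xff]) (?=[\x80-\xff])")
GAP_BETWEEN_CODES = re.compile(rb"(?<=[\x80-\xff])(?=[\x80-\xff])")


def build_dictionary(text: bytes) -> list[bytes]:
    """Pick up to MAX_CODES words, ranked by bytes saved when each becomes one code byte."""
    counts = Counter(WORD.findall(text))
    # Each occurrence of word w saves len(w) - 1 bytes; ties broken alphabetically.
    ranked = sorted(counts, key=lambda w: (-(len(w) - 1) * counts[w], w))
    return [w for w in ranked if len(w) > 1][:MAX_CODES]


def compress(text: bytes) -> tuple[bytes, list[bytes]]:
    if any(b >= CODE_BASE for b in text):
        raise ValueError("this sketch assumes 7-bit ASCII input")
    words = build_dictionary(text)
    code_of = {w: bytes([CODE_BASE + i]) for i, w in enumerate(words)}
    # Step 1: special character replacement (dictionary word -> one code byte).
    replaced = WORD.sub(lambda m: code_of.get(m.group(), m.group()), text)
    # Step 2: space removal. Letter runs are never adjacent in the input, so a
    # single space between two code bytes is redundant and can be restored.
    packed = SPACE_BETWEEN_CODES.sub(b"", replaced)
    return packed, words


def decompress(packed: bytes, words: list[bytes]) -> bytes:
    # Undo step 2: two adjacent code bytes were always separated by one space.
    spaced = GAP_BETWEEN_CODES.sub(b" ", packed)
    # Undo step 1: expand each code byte back into its dictionary word.
    return CODE.sub(lambda m: words[m.group()[0] - CODE_BASE], spaced)


if __name__ == "__main__":
    sample = b"the cat and the dog and the bird"
    packed, words = compress(sample)
    assert decompress(packed, words) == sample
    print(len(sample), len(packed))  # 32 8 (word list not counted)
```

The word list has to be stored or transmitted alongside the packed bytes, so a fair size comparison with LZW, WINZIP or WINRAR would need to count it as well.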
Similar resources
Efficient Lossless Colour Image Compression Using Run Length Encoding and Special Character Replacement
Image compression, in the present context of heavy network traffic, is the subject of major research and development. The lossless compression techniques currently in practice follow three basic paradigms: character repetition removal, frequency measurement and encoding, and dictionary maintenance. In the proposed method, the character repetition removal and dictionary maintenance concepts were in...
A Novel Approach to Compress Centralized Text Data using Indexed Dictionary
Data compression is a very important means of saving memory space. In this proposal, indexed-dictionary-based compression is used for text data, where each word is replaced by its reference in the dictionary. This approach is not file based; a common dictionary containing the words is used for compression. The position of the word in the dictionary is one of the key parts of e...
Domain Specific Hierarchical Huffman Encoding
In this paper, we revisit the classical data compression problem for domain-specific texts. It is well known that the classical Huffman algorithm is optimal with respect to prefix encoding, and that its compression is done at the character level. Since many data transfers are domain specific, for example downloading lecture notes or web blogs, it is natural to think of data compression in larger dimen...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, and the hyphen separates their parts. The Persian language has multi-part words as well. According to Persian morphology, the half-space character is needed to separate the parts of multi-part words, yet in many cases people incorrectly use the space character instead. This common incorrect use of the space character leads to some s...
Morphological Analysis and Diacritical Arabic Text Compression
Morphological analysis of Arabic words makes it possible to reduce the storage requirements of Arabic dictionaries and enables more efficient encoding of diacritical Arabic text, faster spelling and efficient optical character recognition. All these factors allow efficient storage and archival of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based...